
    Enhancing Network Initialization for Medical AI Models Using Large-Scale, Unlabeled Natural Images

    Pre-training on large datasets, such as ImageNet, has become the gold standard in medical image analysis. However, the emergence of self-supervised learning (SSL), which leverages unlabeled data to learn robust features, presents an opportunity to bypass the intensive labeling process. In this study, we explored whether SSL pre-training on non-medical images can be applied to chest radiographs and how it compares to supervised learning (SL) pre-training on non-medical images and on medical images. We utilized a vision transformer and initialized its weights based on (i) SSL pre-training on natural images (DINOv2), (ii) SL pre-training on natural images (ImageNet dataset), and (iii) SL pre-training on chest radiographs from the MIMIC-CXR database. We tested our approach on over 800,000 chest radiographs from six large global datasets, diagnosing more than 20 different imaging findings. Our SSL pre-training on curated images not only outperformed ImageNet-based pre-training (P<0.001 for all datasets) but, in certain cases, also exceeded SL on the MIMIC-CXR dataset. Our findings suggest that selecting the right pre-training strategy, especially with SSL, can be pivotal for improving the diagnostic accuracy of artificial intelligence (AI) in medical imaging. By demonstrating the promise of SSL in chest radiograph analysis, we underline a transformative shift towards more efficient and accurate AI models in medical imaging.
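    One common way to exploit such pre-trained backbones is to freeze them and fit only a lightweight classification head on the extracted features. The sketch below illustrates this linear-probe idea in plain NumPy, with random vectors standing in for vision-transformer embeddings; the feature dimension, data, and training settings are illustrative, not those of the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen backbone features: in the study these would come from a
# pre-trained vision transformer (DINOv2, ImageNet, or MIMIC-CXR weights);
# here, random vectors stand in for them.
n, d = 200, 16
features = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
labels = (features @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

def train_linear_probe(x, y, lr=0.5, steps=500):
    """Fit a logistic-regression head on frozen features (backbone not updated)."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # sigmoid of the head's logits
        w -= lr * x.T @ (p - y) / len(y)        # gradient of the BCE loss w.r.t. w
        b -= lr * np.mean(p - y)                # gradient w.r.t. the bias
    return w, b

w, b = train_linear_probe(features, labels)
accuracy = np.mean((features @ w + b > 0).astype(float) == labels)
```

    Because only the head is trained, this kind of probe is a cheap way to compare initialization strategies before committing to full fine-tuning.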

    Empowering Clinicians and Democratizing Data Science: Large Language Models Automate Machine Learning for Clinical Studies

    A knowledge gap persists between Machine Learning (ML) developers (e.g., data scientists) and practitioners (e.g., clinicians), hampering the full utilization of ML for clinical data analysis. We investigated the potential of ChatGPT Advanced Data Analysis (ADA), an extension of GPT-4, to bridge this gap and perform ML analyses efficiently. Real-world clinical datasets and study details from large trials across various medical specialties were presented to ChatGPT ADA without specific guidance. ChatGPT ADA autonomously developed state-of-the-art ML models based on the original study's training data to predict clinical outcomes such as cancer development, cancer progression, disease complications, or biomarkers such as pathogenic gene sequences. Strikingly, these ML models matched or outperformed their published counterparts. We conclude that ChatGPT ADA offers a promising avenue to democratize ML in medicine, making advanced analytics accessible to non-ML experts and promoting broader applications in medical research and practice.

    Collaborative Training of Medical Artificial Intelligence Models with non-uniform Labels

    Artificial intelligence (AI) methods are revolutionizing medical image analysis. However, robust AI models require large multi-site datasets for training. While multiple stakeholders have provided publicly available datasets, the ways in which these data are labeled differ widely. For example, one dataset of chest radiographs might contain labels denoting the presence of metastases in the lung, while another dataset of chest radiographs might focus on the presence of pneumonia. With conventional approaches, these data cannot be used together to train a single AI model. We propose a new framework that we call flexible federated learning (FFL) for collaborative training on such data. Using publicly available data of 695,000 chest radiographs from five institutions, each with differing labels, we demonstrate that large and heterogeneously labeled datasets can be used to train one large AI model with this framework. We find that models trained with FFL are superior to models that are trained on matching annotations only. This may pave the way for the training of truly large-scale AI models that make efficient use of all existing data.
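    The core trick for training on non-uniformly labeled data is to compute the loss only over the label columns that a given site actually annotated. Below is a minimal NumPy sketch of such a masked loss; the label names, masks, and values are hypothetical, and the published FFL framework involves more than this single step.

```python
import numpy as np

def masked_bce(logits, labels, mask):
    """Binary cross-entropy over a multi-label output, counting only the
    label columns a given site actually annotated (mask == 1)."""
    p = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-9  # numerical guard against log(0)
    per_label = -(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))
    return (per_label * mask).sum() / mask.sum()

# Two hypothetical sites: site A annotates both {metastases, pneumonia},
# site B annotates only {pneumonia}.
logits = np.array([[2.0, -1.0], [0.5, 1.5]])
labels = np.array([[1.0, 0.0], [0.0, 1.0]])
mask_a = np.array([[1.0, 1.0], [1.0, 1.0]])  # all labels known
mask_b = np.array([[0.0, 1.0], [0.0, 1.0]])  # first label never annotated

loss_all = masked_bce(logits, labels, mask_a)
loss_partial = masked_bce(logits, labels, mask_b)
```

    With masking, a site that never annotated a given finding simply contributes no gradient for that output, so heterogeneously labeled datasets can still update one shared model.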

    Polyphonic sonification of electrocardiography signals for diagnosis of cardiac pathologies

    Kather JN, Hermann T, Bukschat Y, Kramer T, Schad LR, Zöllner FG. Polyphonic sonification of electrocardiography signals for diagnosis of cardiac pathologies. Scientific Reports. 2017;7(1):44549.
    Electrocardiography (ECG) data are multidimensional temporal data with ubiquitous applications in the clinic. Conventionally, these data are presented visually. It is presently unclear to what degree data sonification (auditory display) can enable the detection of clinically relevant cardiac pathologies in ECG data. In this study, we introduce a method for polyphonic sonification of ECG data, whereby different ECG channels are simultaneously represented by sounds of different pitch. We retrospectively applied this method to 12 samples from a publicly available ECG database. Together with colleagues from our professional environment, we then analyzed these data in a blinded fashion. Based on these analyses, we found that the sonification technique can be intuitively understood after a short training session. On average, the correct classification rate for observers trained in cardiology was 78%, compared to 68% and 50% for observers not trained in cardiology or not trained in medicine at all, respectively. These values compare to an expected random guessing performance of 25%. Strikingly, 27% of all observers had a classification accuracy over 90%, indicating that sonification can be used very successfully by talented individuals. These findings can serve as a baseline for potential clinical applications of ECG sonification.
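    The channel-to-pitch mapping described above can be sketched in a few lines: each channel's signal modulates the amplitude of its own sine tone, and the tones are mixed into one polyphonic stream. The sample rate, pitches, and toy ECG traces below are illustrative choices, not the parameters of the published method.

```python
import numpy as np

SR = 8000  # audio sample rate in Hz (illustrative)

def sonify(channels, pitches, duration=2.0, sr=SR):
    """Polyphonic sonification sketch: each ECG channel modulates the amplitude
    of its own sine tone; summing the tones yields one audio stream."""
    t = np.linspace(0, duration, int(sr * duration), endpoint=False)
    audio = np.zeros_like(t)
    for signal, f in zip(channels, pitches):
        # resample the (slow) ECG channel onto the audio time grid
        envelope = np.interp(t, np.linspace(0, duration, len(signal)), signal)
        audio += envelope * np.sin(2 * np.pi * f * t)
    return audio / max(len(channels), 1)  # normalize so the mix stays in [-1, 1]

# two hypothetical ECG channels (e.g., leads I and II), 500 samples each
ecg = [np.abs(np.sin(np.linspace(0, 8 * np.pi, 500))),
       np.abs(np.cos(np.linspace(0, 8 * np.pi, 500)))]
audio = sonify(ecg, pitches=[440.0, 660.0])
```

    Assigning each channel a distinct pitch is what lets a listener separate the simultaneous leads by ear, which is the premise the blinded classification experiment tests.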

    Automatic evaluation of tumor budding in immunohistochemically stained colorectal carcinomas and correlation to clinical outcome

    Background: Tumor budding, meaning a detachment of tumor cells at the invasion front of colorectal carcinoma (CRC) into single cells or clusters (≤5 tumor cells), has been shown to correlate with an inferior clinical outcome by several independent studies. Therefore, it has been discussed as a complementary prognostic factor to the TNM staging system, and it is already included in national guidelines as an additional prognostic parameter. However, its application by manual evaluation in routine pathology is hampered by the use of several slightly different assessment systems, a time-consuming manual counting process and a high inter-observer variability. Hence, we established and validated an automatic image processing approach to reliably quantify tumor budding in immunohistochemically (IHC) stained sections of CRC samples. Methods: This approach combines classical segmentation methods (like morphological operations) and machine learning techniques (k-means and hierarchical clustering, convolutional neural networks) to reliably detect tumor buds in colorectal carcinoma samples immunohistochemically stained for pan-cytokeratin. As a possible application, we tested it on whole-slide images as well as on tissue microarrays (TMA) from a clinically well-annotated CRC cohort. Results: Our automatic tumor budding evaluation tool detected the absolute number of tumor buds per image with a very good correlation to the manually segmented ground truth (R2 value of 0.86). Furthermore, in the automatic evaluation of whole-slide images from 20 CRC patients, we found that neither the detected number of tumor buds at the invasion front nor the number in hotspots was associated with the nodal status. However, the number of spatial clusters of tumor buds (budding hotspots) significantly correlated with the nodal status (p-value = 0.003 for N0 vs. N1/N2). TMAs were not feasible for tumor budding evaluation, as the spatial relationship of tumor buds (especially hotspots) was not preserved. Conclusions: Automatic image processing is a feasible and valid assessment tool for tumor budding in CRC on whole-slide images. Interestingly, only the spatial clustering of the tumor buds in hotspots (and especially the number of hotspots), and not the absolute number of tumor buds, showed a clinically relevant correlation with patient outcome in our data.
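    A simple way to operationalize "budding hotspots" is to bin detected bud coordinates into a grid and count densely occupied cells. The sketch below uses hypothetical coordinates, a hypothetical grid size and threshold; the published pipeline's exact clustering procedure may differ.

```python
import numpy as np

def count_hotspots(bud_xy, cell_size=500.0, threshold=5):
    """Bin detected tumor-bud coordinates into a regular grid and count cells
    containing at least `threshold` buds ("hotspots"). Grid size and threshold
    are illustrative, not the values of the published pipeline."""
    cells = {}
    for x, y in bud_xy:
        key = (int(x // cell_size), int(y // cell_size))
        cells[key] = cells.get(key, 0) + 1
    return sum(1 for n in cells.values() if n >= threshold)

rng = np.random.default_rng(1)
# 30 buds clustered near (200, 200) plus 10 buds scattered over the slide
cluster = rng.normal(loc=200.0, scale=50.0, size=(30, 2))
scattered = rng.uniform(0.0, 5000.0, size=(10, 2))
buds = np.vstack([cluster, scattered])
n_hotspots = count_hotspots(buds)
```

    A count like this captures the spatial clustering of buds rather than their absolute number, which is the quantity the study found to correlate with nodal status.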

    Adversarial attacks and adversarial robustness in computational pathology.

    Artificial Intelligence (AI) can support diagnostic workflows in oncology by aiding diagnosis and providing biomarkers directly from routine pathology slides. However, AI applications are vulnerable to adversarial attacks. Hence, it is essential to quantify and mitigate this risk before widespread clinical use. Here, we show that convolutional neural networks (CNNs) are highly susceptible to white- and black-box adversarial attacks in clinically relevant weakly-supervised classification tasks. Adversarially robust training and dual batch normalization (DBN) are possible mitigation strategies but require precise knowledge of the type of attack used at inference. We demonstrate that vision transformers (ViTs) perform equally well compared to CNNs at baseline, but are orders of magnitude more robust to white- and black-box attacks. At a mechanistic level, we show that this is associated with a more robust latent representation of clinically relevant categories in ViTs compared to CNNs. Our results are in line with previous theoretical studies and provide empirical evidence that ViTs are robust learners in computational pathology. This implies that large-scale rollout of AI models in computational pathology should rely on ViTs rather than CNN-based classifiers to provide inherent protection against perturbation of the input data, especially adversarial attacks.
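    A white-box attack of the kind evaluated here can be illustrated with the Fast Gradient Sign Method (FGSM): the input is nudged in the sign of the loss gradient with respect to the input. The toy logistic classifier and perturbation budget below are illustrative only, not the models or attacks of the study.

```python
import numpy as np

def fgsm(x, y, w, b, eps=0.05):
    """FGSM on a logistic classifier: perturb the input in the direction
    (sign of the input-gradient of the loss) that increases the loss."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    grad_x = (p - y) * w  # d(BCE)/dx for a logistic model
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=8)  # hypothetical model weights
b = 0.0
x = rng.normal(size=8)  # a clean input
y = 1.0 if x @ w + b > 0 else 0.0  # the model's own label for x

x_adv = fgsm(x, y, w, b, eps=0.2)
score_clean = x @ w + b
score_adv = x_adv @ w + b  # pushed away from the label y
```

    Even this linear toy shows the mechanism: a small, structured perturbation reliably moves the decision score away from the correct class, which is exactly what robust architectures are meant to resist.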

    Fibroglandular Tissue Segmentation in Breast MRI using Vision Transformers -- A multi-institutional evaluation

    Accurate and automatic segmentation of fibroglandular tissue in breast MRI screening is essential for the quantification of breast density and background parenchymal enhancement. In this retrospective study, we developed and evaluated a transformer-based neural network for breast segmentation (TraBS) in multi-institutional MRI data, and compared its performance to the well-established convolutional neural network nnUNet. TraBS and nnUNet were trained and tested on 200 internal and 40 external breast MRI examinations using manual segmentations generated by experienced human readers. Segmentation performance was assessed in terms of the Dice score and the average symmetric surface distance. The Dice score for nnUNet was lower than for TraBS on the internal test set (0.909±0.069 versus 0.916±0.067, P<0.001) and on the external test set (0.824±0.144 versus 0.864±0.081, P=0.004). Moreover, the average symmetric surface distance was higher (i.e., worse) for nnUNet than for TraBS on the internal test set (0.657±2.856 versus 0.548±2.195, P=0.001) and on the external test set (0.727±0.620 versus 0.584±0.413, P=0.03). Our study demonstrates that transformer-based networks improve the quality of fibroglandular tissue segmentation in breast MRI compared to convolution-based models like nnUNet. These findings might help to enhance the accuracy of breast density and parenchymal enhancement quantification in breast MRI screening.
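    The Dice score used for this comparison is straightforward to compute from two binary masks, as in this NumPy sketch (the masks are toy examples, not study data):

```python
import numpy as np

def dice(pred, target):
    """Dice similarity coefficient between two binary masks:
    2*|A ∩ B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return 2.0 * intersection / denom if denom else 1.0

a = np.zeros((8, 8), dtype=int)
b = np.zeros((8, 8), dtype=int)
a[2:6, 2:6] = 1  # 16 predicted pixels
b[3:7, 3:7] = 1  # 16 reference pixels, overlapping a in a 3x3 block
score = dice(a, b)  # 2*9 / (16+16) = 0.5625
```

    Because Dice is an overlap ratio, it is complemented in the study by the average symmetric surface distance, which instead penalizes boundary deviations in physical units.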

    Deep learning trained on lymph node status predicts outcome from gastric cancer histopathology: a retrospective multicentric study

    Aim Gastric cancer (GC) is a tumor entity with highly variable outcomes. Lymph node metastasis is a prognostically adverse biomarker. We hypothesized that GC primary tissue contains information that is predictive of lymph node status and patient prognosis and that this information can be extracted using Deep Learning (DL). Methods Using three patient cohorts comprising 1146 patients, we trained and validated a DL system to predict lymph node status directly from hematoxylin-and-eosin-stained GC tissue sections. We investigated the concordance between the DL-based prediction from the primary tumor slides (aiN score) and the histopathological lymph node status (pN). Furthermore, we assessed the prognostic value of the aiN score alone and when combined with the pN status. Results The aiN score predicted the pN status, reaching Areas Under the Receiver Operating Characteristic curve (AUROCs) of 0.71 in the training cohort and 0.69 and 0.65 in the two test cohorts. In a multivariate Cox analysis, the aiN score was an independent predictor of patient survival, with Hazard Ratios (HR) of 1.5 in the training cohort and of 1.3 and 2.2 in the two test cohorts. A combination of the aiN score and the pN status prognostically stratified patients by survival with p-values <0.05 in log-rank tests. Conclusion GC primary tumor tissue contains additional prognostic information that is accessible using the aiN score. In combination with the pN status, this could be used for personalized management of gastric cancer patients after prospective validation.
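    The AUROC values reported above can be computed without any library as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties count half); the scores and labels below are hypothetical:

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive outscores a random
    negative, with ties counted as 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# hypothetical aiN-style scores against ground-truth status (1 = node-positive)
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
value = auroc(scores, labels)  # 8 of 9 positive/negative pairs are ranked correctly
```

    This pairwise-ranking view makes clear why an AUROC of 0.71 means the score orders patients better than chance (0.5) but far from perfectly (1.0).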

    Direct prediction of genetic aberrations from pathology images in gastric cancer with swarm learning.

    BACKGROUND Computational pathology uses deep learning (DL) to extract biomarkers from routine pathology slides. Large multicentric datasets improve performance, but such datasets are scarce for gastric cancer. This limitation could be overcome by Swarm Learning (SL). METHODS Here, we report the results of a multicentric retrospective study of SL for the prediction of molecular biomarkers in gastric cancer. We collected tissue samples with known microsatellite instability (MSI) and Epstein-Barr Virus (EBV) status from four patient cohorts from Switzerland, Germany, the UK and the USA, storing each dataset on a physically separate computer. RESULTS On an external validation cohort, the SL-based classifier reached an area under the receiver operating characteristic curve (AUROC) of 0.8092 (± 0.0132) for MSI prediction and 0.8372 (± 0.0179) for EBV prediction. The centralized model, which was trained on all datasets on a single computer, reached a similar performance. CONCLUSIONS Our findings demonstrate the feasibility of SL-based molecular biomarkers in gastric cancer. In the future, SL could be used for collaborative training and, thus, improve the performance of these biomarkers. This may ultimately result in clinical-grade performance and generalizability.
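    At its core, Swarm Learning periodically merges model parameters across sites, so no raw patient data ever leaves a hospital. The sketch below shows a plain parameter average over hypothetical sites; the actual SL protocol adds decentralized peer coordination that this sketch omits.

```python
import numpy as np

def merge_weights(site_weights):
    """Average corresponding parameter arrays from several sites; each site
    keeps its data local and shares only these parameters (sketch only)."""
    return [np.mean(np.stack(layer), axis=0) for layer in zip(*site_weights)]

# three hypothetical sites, each holding the same two-layer parameter list
site_a = [np.ones((2, 2)), np.array([1.0, 1.0])]
site_b = [np.zeros((2, 2)), np.array([0.0, 0.0])]
site_c = [np.full((2, 2), 2.0), np.array([2.0, 2.0])]

merged = merge_weights([site_a, site_b, site_c])
```

    Repeating local training and merging in rounds is what lets the distributed model approach the performance of the centralized baseline reported above.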